BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark.
Abstract
Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high-quality, manually refined reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site (http://www-bio3d-igbmc.u-strasbg.fr/balibase) has been completely redesigned to provide a more user-friendly, interactive interface for the visualization of the BAliBASE reference alignments and the associated annotations.
Similar resources
BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs
SUMMARY: BAliBASE is a database of manually refined multiple sequence alignments categorized by core blocks of conservation, sequence length, similarity, and the presence of insertions and N/C-terminal extensions. AVAILABILITY: From http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE/index.html
Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark
BAliBASE is one of the most widely used benchmarks for multiple sequence alignment programs. The accuracy of alignment methods is measured by bali_score, an application provided together with the database. The standard accuracy measures are the Sum of Pairs (SP) and the Total Column (TC) scores. We have found that, for non-core block columns, results calculated by bali_score are different from those ob...
Evaluation of protein multiple alignments by SAM-T99 using the BAliBASE multiple alignment test set
MOTIVATION: SAM-T99 is an iterative hidden Markov model-based method for finding proteins similar to a single target sequence and aligning them. One of its main uses is to produce multiple alignments of homologs of the target sequence. Previous tests of SAM-T99 and its predecessors have concentrated on the quality of the searches performed, not on the quality of the multiple alignment. In this p...
ProbCons: Probabilistic consistency-based multiple sequence alignment.
To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consist...
TSGA-MSA: Trace Sequence Algorithm for Alignment of MSA
Multiple sequence alignment (MSA) is an important, NP-complete problem in bioinformatics. In this paper, we propose an iterative alignment method using a Genetic Algorithm for Multiple Sequence Alignment, named TSGA-MSA. The steps in this algorithm are discussed in detail and its performance on a set of benchmark datasets from BAliBASE 2.0 is analysed. The experimental results, th...
Journal title:
- Proteins
Volume 61, Issue 1
Pages -
Publication date 2005